ABSTRACT

In the process of developing a conditionally-dependent item response theory
(IRT) model, the problem arose of modeling an underlying multivariate normal
(MVN) response process with general correlation among the items. Without the
assumption of conditional independence, for which the underlying MVN cdf
takes on comparatively simple forms and can be numerically evaluated using
existing reduction formulae, the task required the development of a
computationally fast, tractable, and accurate approximation of MVN orthant
probabilities for general correlation $\{\rho_{ij}\}$. Previous technical
reports by the present authors have provided such a method, based on
E. Clark's (1961) approximation of the moments of $n$ correlated random
normal variables. Research continues in the area of applying this algorithm
to problems in IRT. This report focuses on the application of previous
research results to another problem in statistics: the generation of
simultaneous confidence bounds for multiple correlated comparisons.
C. W. Dunnett's test for multiple treatments compared to a single control is
generalized to various unbalanced cases. There is a large amount of
statistical literature on this topic. However, as in IRT, the solutions have
been based on reduction formulae that are limited to special cases, which
arise in the comparison of multiple treatment groups each of size
$n_i = m$ to a single control group of size $n_0$. More general problems,
such as obtaining simultaneous confidence bounds for regression
coefficients, cannot be solved using these existing methods. This report
illustrates how the results obtained in the IRT context can be applied to
simultaneous statistical inference problems of various kinds. (Author/RLC)
ABSTRACT

In the process of developing a conditionally dependent item-response theory
model, we were confronted with the problem of modeling an underlying
multivariate normal (MVN) response process with general correlation among
the items. Without the assumption of conditional independence, for which
the underlying MVN cdf takes on comparatively simple forms and can be
numerically evaluated using existing reduction formulae, our task required
the development of a computationally fast, tractable and accurate
approximation of multivariate normal orthant probabilities for general
correlation $\{\rho_{ij}\}$. Our previous technical reports have provided
such a method, based on Clark's (1961) approximation of the moments of $n$
correlated random normal variables. The major thrust of our work continues
in the area of applying this algorithm to problems in item-response theory
(IRT). The focus of this report, however, is on the application of our
previous results to another problem in statistics; namely, the generation of
simultaneous confidence bounds for multiple correlated comparisons. There is
a large statistical literature on this topic; however, as in IRT, the
solutions have been based on reduction formulae, which limits their
application to special cases (e.g., equi-correlation) such as arise in the
comparison of multiple treatment groups each of size $n_i = m$ to a single
control of size $n_0$. More general problems, such as obtaining simultaneous
confidence bounds for regression coefficients, cannot be solved using these
existing methods. In this report we illustrate how the results we have
obtained in the IRT context can be applied to simultaneous statistical
inference problems of various kinds.
1 Introduction



If the random variables $X_1, \ldots, X_p$ follow the multivariate normal
(MVN) distribution with means zero, common variance $\sigma^2$, and a
correlation matrix $R = \{\rho_{ij}\}$, and if $\nu s^2/\sigma^2$ is an
independent $\chi^2$ variable with $\nu$ degrees of freedom, then the random
vector $t = (t_1, \ldots, t_p)$, where $t_i = X_i/s$ (for $i = 1, \ldots, p$),
is said to have the p-variate t-distribution with $\nu$ degrees of freedom.
This distribution is the multivariate analogue of Student's t with density
function:

\[
f(t_1, \ldots, t_p) =
\frac{\Gamma\!\left(\frac{\nu + p}{2}\right)}
     {(\nu\pi)^{p/2}\,\Gamma\!\left(\frac{\nu}{2}\right)\,|R|^{1/2}}
\left(1 + \frac{t' R^{-1} t}{\nu}\right)^{-(\nu + p)/2}.
\tag{1}
\]

The distribution has applications in a number of statistical problems, most
notably in the multiple comparison of several treatments with a control
(Dunnett, 1955), and as John (1961) has noted, in the construction of
simultaneous confidence bounds for the parameters in a linear model. We will
discuss these applications of the multivariate t-distribution and suggest a
numerical method for evaluating the probabilities associated with this
distribution.



2 Dunnett's Test 

Consider the problem of comparing each of $p$ treatments with a control in
respect to their means $\mu_0, \mu_1, \mu_2, \ldots, \mu_p$, where $\mu_0$
designates the control and $\mu_i$ ($i = 1, \ldots, p$) the treatments.
Assume that the observations are normally and independently distributed with
common within-group standard deviation $\sigma$. In this case, Dunnett (1955)
has provided a procedure for making confidence statements about the $p$
differences $\mu_i - \mu_0$, such that the probability of all $p$ statements
being simultaneously correct is equal to a specified $P$ level. Dunnett's
procedure and the associated tables are available for the case of equal
sample sizes in all groups. Here, we will expand the procedure to the case
where the sample sizes are not equal, and to an even more general class of
problems involving simultaneous statistical inference.

Suppose that there are $n_0$ observations for the control, $n_1$ observations
for the first treatment, ..., $n_p$ observations for the $p$-th treatment,
and denote these observations by $X_{ij}$ ($i = 0, 1, \ldots, p$;
$j = 1, 2, \ldots, n_i$) and the corresponding $i$-th treatment mean as
$\bar{X}_i$. Assume that there is an estimate of $\sigma^2$ available
(denoted $s^2$) based on $\nu$ degrees of freedom, which is independent of
the estimators of the means. Now let

\[
z_i = \frac{\bar{X}_i - \bar{X}_0 - (\mu_i - \mu_0)}
           {\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_0}}}
\tag{2}
\]

and let $t_i = z_i/s$ for $i = 1, 2, \ldots, p$. As Dunnett (1955) notes, the
lower confidence limits with joint confidence coefficient $P$ for the $p$
treatment effects $\mu_i - \mu_0$ are given by

\[
\bar{X}_i - \bar{X}_0 - d_i s \sqrt{\frac{1}{n_i} + \frac{1}{n_0}},
\tag{3}
\]

if the $p$ constants $d_i$ are chosen so that

\[
\mathrm{Prob}(t_1 < d_1, t_2 < d_2, \ldots, t_p < d_p) = P.
\tag{4}
\]

To find the $p$ constants $d_i$ that satisfy these equations, the joint
distribution of the $t_i$ is required, which is the multivariate analogue of
Student's t-distribution defined by Dunnett and Sobel (1955). Dunnett (1955)
has shown how the problem of evaluating the multivariate t-distribution can
be reduced to the problem of evaluating the corresponding MVN distribution.
For the latter, notice that the joint distribution of the $z_i$ is a MVN
distribution with means 0 and variances $\sigma^2$. The correlation between
$z_i$ and $z_j$ is given by:

\[
\rho_{ij} = 1 \Big/ \sqrt{\left(\frac{n_0}{n_i} + 1\right)
                          \left(\frac{n_0}{n_j} + 1\right)},
\tag{5}
\]

which for the special case of equal sample sizes equals 1/2 for all $i$ and
$j$. Dunnett and Sobel note that the joint probability statement given above
can be written in the following way:

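\[
\begin{aligned}
P &= \mathrm{Prob}(t_1 < d_1, t_2 < d_2, \ldots, t_p < d_p) \\
  &= \mathrm{Prob}(z_1 < d_1 s, z_2 < d_2 s, \ldots, z_p < d_p s) \\
  &= \int_{-\infty}^{+\infty} F(d_1 s, d_2 s, \ldots, d_p s)\, f(s)\, ds,
\end{aligned}
\tag{6}
\]

where $F(d_1 s, d_2 s, \ldots, d_p s)$ is the MVN cdf of the $z_i$ and $f(s)$
is the one-dimensional density function of $s$. Thus, with probability values
for $F(\cdot)$, the above equation can be evaluated using numerical
integration over the distribution of $s$. For this, note that the density
function of $s$ is given by Pearson and Hartley (1976) as:

\[
f(s) = \frac{\nu^{\nu/2}}{\Gamma(\nu/2)\, 2^{\nu/2 - 1}}\,
       \sigma^{-\nu}\, s^{\nu - 1} \exp\!\left(-\frac{\nu s^2}{2\sigma^2}\right).
\tag{7}
\]

Since $\nu s^2/\sigma^2 \sim \chi^2_\nu$, we can rewrite the equation for $P$
in terms of integration over the distribution of $u = s/\sigma$ (which is
defined on 0 to $+\infty$) as:

\[
\begin{aligned}
P &= \int_0^{\infty} F(d_1 u, d_2 u, \ldots, d_p u)\, f(s)\, \frac{ds}{du}\, du \\
  &= \int_0^{\infty} F(d_1 u, d_2 u, \ldots, d_p u)\,
     \frac{\nu^{\nu/2}}{\Gamma(\nu/2)\, 2^{\nu/2 - 1}}\,
     u^{\nu - 1} \exp(-\nu u^2/2)\, du.
\end{aligned}
\tag{8}
\]

Numerical integration over the distribution of $u$ can then be performed to
yield the associated probability $P$ for selected values of $d$, $p$, and
$\nu$.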
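
As a rough illustration of the two ingredients used above, the following
Python sketch computes the correlation matrix in (5) from the group sample
sizes and evaluates the density of $u = s/\sigma$ that appears as the weight
in (8). The function names (`dunnett_correlation`, `u_density`, `simpson`)
are hypothetical, and Simpson's rule is only one of several reasonable
choices for the numerical integration; the check that the density integrates
to one is an added sanity test, not part of the original derivation.

```python
import math

def dunnett_correlation(n0, n):
    """Correlation matrix of z_1..z_p from equation (5).

    n0 is the control sample size; n lists the treatment sample sizes.
    """
    a = [1.0 / math.sqrt(n0 / ni + 1.0) for ni in n]
    p = len(n)
    return [[1.0 if i == j else a[i] * a[j] for j in range(p)]
            for i in range(p)]

def u_density(u, nu):
    """Density of u = s / sigma when nu * s^2 / sigma^2 is chi-square(nu)."""
    if u <= 0.0:
        return 0.0
    log_c = (0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0))
    return math.exp(log_c + (nu - 1.0) * math.log(u) - 0.5 * nu * u * u)

def simpson(f, a, b, m=400):
    """Composite Simpson rule with m (even) subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0

# Equal sample sizes: every off-diagonal correlation should be 1/2.
rho_equal = dunnett_correlation(10, [10, 10, 10, 10])
# Larger treatment groups than control: correlations rise above 1/2.
rho_unequal = dunnett_correlation(10, [20, 20])
# The weight in (8) should integrate to one (here with nu = 45).
mass = simpson(lambda u: u_density(u, 45), 0.0, 3.0)
print(rho_equal[0][1], rho_unequal[0][1], mass)
```

The integration range 0 to 3 is an assumption that suits moderately large
$\nu$, where $u$ is concentrated near 1; small $\nu$ would need a wider range.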



3 Some Special Cases 

Direct evaluation of the MVN cdf is not possible for p > 3. In the following, 
we note some special cases for which reduction formulae are available. 



3.1 Case 1: $n_0 = n_i = n$, ($i = 1, \ldots, p$)

When all sample sizes are equal, the correlation in (5) is 1/2 for all
possible pairings of the treatment groups and the control. Dunnett (1955) has
given tables for the critical values of this distribution. In this case, the
MVN probability in (8) is simply

\[
F_p(0, 0, \ldots, 0; \{.5\}) = \frac{1}{p + 1}.
\tag{9}
\]



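3.2 Case 2: $n_0 = n$ and $n_i = m$, ($i = 1, \ldots, p$)

When the $p$ treatment groups are each of size $m$, but the control group is
of size $n$, where $n \neq m$, then from (5), $\rho_{ij} = \rho$ for all
$i \neq j$, and the probability in (8) is

\[
F_p(ds, ds, \ldots, ds; \{\rho\}) =
\int_{-\infty}^{+\infty}
F^p\!\left(\frac{ds + \rho^{1/2} y}{(1 - \rho)^{1/2}}\right) f(y)\, dy,
\tag{10}
\]

where $f(t) = \exp(-\tfrac{1}{2}t^2)/(2\pi)^{1/2}$ and
$F(t) = \int_{-\infty}^{t} f(t)\, dt$; see Gupta (1963).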
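
The one-dimensional reduction in (10) is easy to check numerically. The
sketch below is a minimal illustration, assuming standardized variates and a
common correlation $0 \le \rho < 1$; the name `equicorrelated_cdf` and the
use of Simpson's rule on a truncated range are choices made for this example
only. With $h = 0$ and $\rho = .5$ the result can be compared with the closed
form $1/(p+1)$ in (9).

```python
import math

def norm_pdf(t):
    """Standard normal density f(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    """Standard normal distribution function F(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def equicorrelated_cdf(h, p, rho, m=800, lim=8.0):
    """P(z_1 < h, ..., z_p < h) for standard normal z_i with common
    correlation rho, using Simpson's rule on the integral in (10)."""
    def integrand(y):
        arg = (h + math.sqrt(rho) * y) / math.sqrt(1.0 - rho)
        return norm_cdf(arg) ** p * norm_pdf(y)
    step = 2.0 * lim / m
    total = integrand(-lim) + integrand(lim)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * integrand(-lim + k * step)
    return total * step / 3.0

# Compare the quadrature with the closed form 1 / (p + 1) from (9).
for p in (2, 3, 4):
    print(p, equicorrelated_cdf(0.0, p, 0.5), 1.0 / (p + 1))
```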



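3.3 Case 3: $n_0 = n$ and $n_i$ unequal

When the treatment group sample sizes are unequal, the correlation matrix
$\{\rho_{ij}\}$ has the special form

\[
\rho_{ij} = \alpha_i \alpha_j
\]

for ($i \neq j$), where $-1 < \alpha_i < +1$. In this case, the MVN cdf is:

\[
F_p(d_1 s, d_2 s, \ldots, d_p s; \{\rho_{ij}\}) =
\int_{-\infty}^{+\infty}
\left[\prod_{i=1}^{p}
F\!\left(\frac{d_i s + \alpha_i y}{(1 - \alpha_i^2)^{1/2}}\right)\right]
f(y)\, dy
\tag{11}
\]

(see Dunnett and Sobel, 1955). This MVN integral can be approximated to any
practical degree of accuracy using Gauss-Hermite quadrature (Stroud and
Secrest, 1966).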
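
To make the combination of the product-form cdf with the outer integration in
(8) concrete, here is a hedged Python sketch that searches for a common
constant $d$ giving joint probability $P$. Several things are assumptions of
this example rather than statements from the text: Simpson's rule is
substituted for the Gauss-Hermite quadrature mentioned above, $\nu$ is taken
to be the pooled within-group degrees of freedom, the outer integral is
truncated to about eight standard deviations of $u$ around 1 (reasonable only
for moderately large $\nu$), and all function names are hypothetical.

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def simpson(f, a, b, m):
    """Composite Simpson rule with m (even) subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return total * step / 3.0

def product_cdf(h, alpha, m=120, lim=7.0):
    """P(z_i < h_i for all i), standardized MVN, rho_ij = alpha_i * alpha_j."""
    def integrand(y):
        value = norm_pdf(y)
        for hi, ai in zip(h, alpha):
            value *= norm_cdf((hi + ai * y) / math.sqrt(1.0 - ai * ai))
        return value
    return simpson(integrand, -lim, lim, m)

def u_density(u, nu):
    """Density of u = s / sigma, where nu * u^2 is chi-square with nu df."""
    log_c = (0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0))
    return math.exp(log_c + (nu - 1.0) * math.log(u) - 0.5 * nu * u * u)

def joint_probability(d, n0, n, nu, m=120):
    """Equation (8) with one common constant d for all p comparisons."""
    alpha = [1.0 / math.sqrt(n0 / ni + 1.0) for ni in n]  # from (5)
    half_width = 8.0 / math.sqrt(2.0 * nu)  # about 8 sd of u around 1
    lo, hi = max(1e-6, 1.0 - half_width), 1.0 + half_width
    return simpson(
        lambda u: product_cdf([d * u] * len(n), alpha) * u_density(u, nu),
        lo, hi, m)

def critical_value(P, n0, n, nu, iterations=25):
    """Bisection on d so that joint_probability(d) is close to P."""
    lo, hi = 0.0, 10.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if joint_probability(mid, n0, n, nu) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Five groups of 10 (control plus four treatments); nu = 50 - 5 is assumed.
d = critical_value(0.95, 10, [10, 10, 10, 10], nu=45)
print(round(d, 2))
```

If the assumptions above match those behind Table 1, the printed value should
land close to the entry for the first row of that table.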

4 The General Case 

The special cases in the previous sections provide methods for evaluating
the MVN integral in (8) that cover all possible applications of the
Dunnett-type multiple comparison with control procedure, regardless of the
sample sizes of the various groups. Nevertheless, there are still situations
in which a completely general solution is required. Of course, for the
general case, a more general method for evaluating the probabilities of the
MVN cdf $F(\cdot)$ is needed. For example, in regression analysis the
$(b_1, b_2, \ldots, b_p)$ are MVN with means $(\beta_1, \ldots, \beta_p)$ and
variance covariance matrix $\{c_{ij}\}\sigma^2 = S^{-1}\sigma^2$, where
$S_{ij} = \sum_{r=1}^{N} (x_{ir} - \bar{x}_i)(x_{jr} - \bar{x}_j)$ for
($i, j = 1, 2, \ldots, p$). In this case,
$\{\rho_{ij}\} = (c_{ii} c_{jj})^{-1/2} c_{ij}$ and none of the previous
reduction formulae apply. One computationally tractable possibility is to use
Clark's (1961) formulae for the moments of the maximum of $p$ correlated
normal variables as applied by Gibbons et al. (19S1) to the problem of
approximating MVN orthant probabilities. A brief description of this
approximation is now provided, and we will show that these probabilities are
sufficiently accurate for practical purposes.
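
The scaling $\{\rho_{ij}\} = (c_{ii} c_{jj})^{-1/2} c_{ij}$ used in the
regression case can be written in a few lines. This is only a sketch: the
function name `correlation_from_c` is hypothetical, and the 2 x 2 matrix is
an illustrative input, not data from this report.

```python
import math

def correlation_from_c(c):
    """Scale a symmetric matrix {c_ij} to rho_ij = c_ij / sqrt(c_ii * c_jj)."""
    p = len(c)
    return [[c[i][j] / math.sqrt(c[i][i] * c[j][j]) for j in range(p)]
            for i in range(p)]

# Hypothetical 2 x 2 example (values chosen for illustration only).
c = [[4.0, 1.0],
     [1.0, 9.0]]
rho = correlation_from_c(c)
print(rho)
```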

4.1 The Clark Algorithm

We begin by noting that the MVN cdf
$F_p(d_1 s, d_2 s, \ldots, d_p s; \{\rho_{ij}\})$ can be written (using $n$
for the number of variates in what follows) as:

\[
F_n(h_1, \ldots, h_n; \{\rho_{ij}\}) =
\int_{-\infty}^{h_1} \cdots \int_{-\infty}^{h_n}
f(z_1, \ldots, z_n; \{\rho_{ij}\})\, dz_1 \cdots dz_n,
\tag{12}
\]

where $h_i = d_i s$. If $h_1 = \cdots = h_n = h = 0$, and the $z_i$ follow a
standardized MVN distribution, $F_n$ is a so-called "orthant" probability.
However, note that we can also write this MVN probability as:

\[
F_n = \Pr\{\max(x_1, \ldots, x_n) < 0\}.
\tag{13}
\]

If $\max(x_1, \ldots, x_n)$ were normally distributed, which it clearly is
not, with mean $E[\max(x_1, \ldots, x_n)]$ and variance
$V[\max(x_1, \ldots, x_n)]$, then

\[
F_n = \Phi\!\left(
\frac{h - E[\max(x_1, \ldots, x_n)]}{\sqrt{V[\max(x_1, \ldots, x_n)]}}
\right),
\tag{14}
\]

where in this case $h = 0$. For general $h_i$, we would set $h = 0$ and
subtract $h_i$ from the mean of $x_i$.

In order to proceed, we need the first two moments of
$\max(x_1, \ldots, x_n)$, where the $x_i$ have a joint MVN distribution with
general correlation $\{\rho_{ij}\}$, and some bound on the error introduced
by assuming that $\max(x_1, \ldots, x_n)$ has a normal distribution. Clark
(1961) has provided an approximation for the first four moments of the
maximum of $p$ jointly normal correlated random variables, and Gibbons et al.
(1990) have shown that the accuracy of the approximation is approximately
$10^{-3}$ in problems of this kind. An overview of the approximation is
provided in the following.

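Let any three successive components from a $p$-variate vector, $y_i$, be
distributed:

\[
\begin{pmatrix} y_i \\ y_{i+1} \\ y_{i+2} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \mu_i \\ \mu_{i+1} \\ \mu_{i+2} \end{pmatrix},
\begin{pmatrix}
\sigma_i^2 & \sigma_i\sigma_{i+1}\rho_{i,i+1} & \sigma_i\sigma_{i+2}\rho_{i,i+2} \\
\sigma_i\sigma_{i+1}\rho_{i,i+1} & \sigma_{i+1}^2 & \sigma_{i+1}\sigma_{i+2}\rho_{i+1,i+2} \\
\sigma_i\sigma_{i+2}\rho_{i,i+2} & \sigma_{i+1}\sigma_{i+2}\rho_{i+1,i+2} & \sigma_{i+2}^2
\end{pmatrix}
\right).
\]

Let $\tilde{y}_i = \max(y_i) = y_i$, and compute the probability that
$y_{i+1} > \tilde{y}_i$ as follows: set

\[
z_{i+1} = (\mu_i - \mu_{i+1})/\zeta_{i+1},
\]

where

\[
\zeta_{i+1}^2 = \sigma_i^2 + \sigma_{i+1}^2 - 2\sigma_i\sigma_{i+1}\rho_{i,i+1}.
\]

Then

\[
P(y_{i+1} > \tilde{y}_i) = P(y_{i+1} - \tilde{y}_i > 0) = \Phi(-z_{i+1}),
\]

the value of the univariate normal distribution function at the standard
deviate $-z_{i+1}$.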
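Now let $\tilde{y}_{i+1} = \max(y_i, y_{i+1})$ and assume (as an
approximation) that $(y_{i+2}, \tilde{y}_{i+1})$ is bivariate normal with
means,

\[
\begin{aligned}
\mu(y_{i+2}) &= E(y_{i+2}) = \mu_{i+2}, \\
\mu(\tilde{y}_{i+1}) &= E(\tilde{y}_{i+1})
  = \mu_i\Phi(z_{i+1}) + \mu_{i+1}\Phi(-z_{i+1}) + \zeta_{i+1}\phi(z_{i+1}),
\end{aligned}
\tag{15}
\]

where $\phi$ denotes the standard normal density, variances

\[
\begin{aligned}
\sigma^2(y_{i+2}) &= E(y_{i+2}^2) - E^2(y_{i+2}) = \sigma_{i+2}^2, \\
\sigma^2(\tilde{y}_{i+1}) &= E(\tilde{y}_{i+1}^2) - E^2(\tilde{y}_{i+1}),
\end{aligned}
\tag{16}
\]

where

\[
E(\tilde{y}_{i+1}^2) = (\mu_i^2 + \sigma_i^2)\Phi(z_{i+1})
  + (\mu_{i+1}^2 + \sigma_{i+1}^2)\Phi(-z_{i+1})
  + (\mu_i + \mu_{i+1})\zeta_{i+1}\phi(z_{i+1}),
\tag{17}
\]

and correlation

\[
\rho(\tilde{y}_{i+1}, y_{i+2}) =
\frac{\sigma_i\rho_{i,i+2}\Phi(z_{i+1}) + \sigma_{i+1}\rho_{i+1,i+2}\Phi(-z_{i+1})}
     {\sigma(\tilde{y}_{i+1})}.
\tag{18}
\]

Then,

\[
P(y_{i+2} = \max(y_i, y_{i+1}, y_{i+2}))
= P\big((y_{i+2} - y_{i+1} > 0) \cap (y_{i+2} - y_i > 0)\big)
\tag{19}
\]

is approximated by

\[
\begin{aligned}
P(y_{i+2} > \tilde{y}_{i+1}) &= P(y_{i+2} - \tilde{y}_{i+1} > 0) \\
&= \Phi\!\left(
\frac{\mu_{i+2} - \mu(\tilde{y}_{i+1})}
     {\sqrt{\sigma_{i+2}^2 + \sigma^2(\tilde{y}_{i+1})
       - 2\sigma_{i+2}\sigma(\tilde{y}_{i+1})\rho(\tilde{y}_{i+1}, y_{i+2})}}
\right).
\end{aligned}
\tag{20}
\]

Assuming as a working approximation that $\tilde{y}_{i+1}$ is normally
distributed with the above mean and variance, we may therefore proceed
recursively from $i = 1$ to $i = p - 1$, where $y_{p+1}$ is an independent
dummy variate with mean zero and variance zero (i.e., $y_{p+1} = 0$). Then,
for example,

\[
\begin{aligned}
P[y_{p+1} = \max(y_1, y_2, \ldots, y_{p+1})]
&= P[(y_{p+1} - y_1 > 0) \cap (y_{p+1} - y_2 > 0) \cap \cdots \cap (y_{p+1} - y_p > 0)] \\
&= P[(-y_1 > 0) \cap (-y_2 > 0) \cap \cdots \cap (-y_p > 0)]
\end{aligned}
\tag{21}
\]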
approximates the negative orthant. The probability of any other orthant can
be obtained by reversing the signs of the variates corresponding to 1's in
the orthant pattern.

More generally, to compute any MVN orthant probability, for example,

\[
\int_{-\infty}^{h} \int_{-\infty}^{h} \cdots \int_{-\infty}^{h}
f(y_1, \ldots, y_p; \{\rho_{ij}\})\, dy_1 \cdots dy_p,
\tag{22}
\]

we compute the negative orthant setting $\mu_{p+1} = h$. Finally, to
approximate the integral for general $h_i$, we compute the negative orthant
by setting $\mu_{p+1} = 0$ and $\mu_i = \mu_i - h_i$. In the present context
$h_i = d_i s$.
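
The recursion described in this section can be sketched in Python as follows.
This is an illustrative reading of the steps above, not the authors' program:
the function name `clark_orthant` and its argument layout are hypothetical,
the thresholds $h_i$ are handled by subtracting them from the means, and the
final step uses the dummy variate $y_{p+1} = 0$, which reduces to
$\Phi(-\mu/\sigma)$ of the approximated maximum.

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def clark_orthant(h, rho, mu=None, sigma=None):
    """Approximate P(x_1 < h_1, ..., x_p < h_p) for MVN x with correlation
    matrix rho, using Clark's moment recursion for the running maximum."""
    p = len(h)
    mu = [0.0] * p if mu is None else list(mu)
    sigma = [1.0] * p if sigma is None else list(sigma)
    # Shift so that every threshold becomes zero: y_i = x_i - h_i.
    means = [mu[i] - h[i] for i in range(p)]
    # The running maximum starts as y_1.
    m, s = means[0], sigma[0]
    r = [rho[0][j] for j in range(p)]  # corr(running max, y_j)
    for k in range(1, p):
        zeta2 = s * s + sigma[k] ** 2 - 2.0 * s * sigma[k] * r[k]
        zeta = math.sqrt(max(zeta2, 1e-300))  # guard against zeta = 0
        z = (m - means[k]) / zeta
        cdf_pos, cdf_neg, pdf = norm_cdf(z), norm_cdf(-z), norm_pdf(z)
        # First two moments of max(running max, y_k), as in (15) and (17).
        e1 = m * cdf_pos + means[k] * cdf_neg + zeta * pdf
        e2 = ((m * m + s * s) * cdf_pos
              + (means[k] ** 2 + sigma[k] ** 2) * cdf_neg
              + (m + means[k]) * zeta * pdf)
        s_new = math.sqrt(max(e2 - e1 * e1, 1e-300))
        # Correlation of the new maximum with each remaining variate, (18).
        for j in range(k + 1, p):
            r[j] = (s * r[j] * cdf_pos + sigma[k] * rho[k][j] * cdf_neg) / s_new
        m, s = e1, s_new
    # Dummy variate y_{p+1} = 0: P(max < 0) is approximated by Phi(-m / s).
    return norm_cdf(-m / s)

rho2 = [[1.0, 0.5], [0.5, 1.0]]
rho3 = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(clark_orthant([0.0, 0.0], rho2), 1.0 / 3.0)
print(clark_orthant([0.0, 0.0, 0.0], rho3), 1.0 / 4.0)
```

The two printed pairs compare the approximation with the exact
equicorrelated values $1/(p+1)$ from (9); the agreement is approximate, not
exact, because the maximum is only treated as if it were normal.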

4.2 Applications 

To illustrate the use of the general approximation, consider the following
two multivariate prediction problems.

4.2.1 Confidence Bounds for Means

Simultaneous confidence bounds for the means of correlated normal variables
can now be found using the general method. Suppose $x_1, \ldots, x_p$ are MVN
with mean vector $\mu_1, \ldots, \mu_p$ and dispersion matrix
$\sigma^2\{\rho_{ij}\}$, where $\{\rho_{ij}\}$ is the correlation matrix. The
Clark algorithm can be used to satisfy the inequality,

\[
\bar{x}_i - N^{-1/2} ds < \mu_i < \bar{x}_i + N^{-1/2} ds
\tag{23}
\]

for ($i = 1, \ldots, p$).

4.2.2 Confidence Bounds for a Future Observation

Similarly, simultaneous confidence bounds for a future $p$-variate
observation may also be found in this way. Suppose $x_1, \ldots, x_p$
represent a future observation vector from a MVN population with equal
variances and correlation matrix $\{\rho_{ij}\}$. A previous sample of size
$N$ is available from which the estimates $\bar{x}$ and $s$ are obtained. The
Clark algorithm can be used to satisfy the inequality,

\[
\bar{x}_i - (1 + 1/N)^{1/2} ds < x_i < \bar{x}_i + (1 + 1/N)^{1/2} ds
\tag{24}
\]

for ($i = 1, \ldots, p$). The value $h = ds$ is selected such that the
desired confidence level $P$ in (6) is obtained.



Table 1

95% Critical Values for Various Modifications of Dunnett's Test

Case   $n_0$   $n_1$   $n_2$   $n_3$   $n_4$      I      II     III     IV
  1      10      10      10      10      10     2.22   2.22   2.22   2.20
  1      20      20      20      20      20     2.19   2.19   2.19   2.17
  2      10      20      20      20      20     2.19   2.13   2.13   2.12
  2      10      30      30      30      30     2.18   2.09   2.09   2.07
  2      20      10      10      10      10     2.20   2.25   2.25   2.22
  3      10      30      50      20      10     2.19   2.10   2.11   2.10
  3      10       5      50      10      50     2.13   2.09   2.13   2.13

I    Dunnett's original test, $\rho = .5$ (correct for case 1 only)

II   $n_0$ controls and $m = n_1 = \cdots = n_p$ treatments,
     $\rho_{ij} = \rho$ (correct for cases 1 & 2)

III  All $n_i$ potentially different, $\rho_{ij} = \alpha_i\alpha_j$
     (1-dimensional quadrature: correct for cases 1, 2, 3)

IV   General $\{\rho_{ij}\}$ (Clark approximation for all cases)



5 Illustrations 

Table 1 presents comparisons of various modifications of Dunnett's test for
various sample size combinations for a 5 group study.

Inspection of Table 1 reveals that all three of the reduction formulae work
exactly as anticipated. The general solution based on the Clark approximation
performs quite well, and if anything, its accuracy is best in those cases
when it is most needed, i.e., when the correlations are heterogeneous.
Dunnett's original tabled values (i.e., case I) appear to overestimate the
true values when $n_0 < n_i$ and underestimate the true values when
$n_0 > n_i$. In general, the case II solution (i.e., $\rho_{ij} = \rho$)
works reasonably well under all conditions; however, it is somewhat biased in
the final example, in which the sample sizes are quite variable.

As a second numerical example, let us return to the problem of obtaining
simultaneous confidence bounds for regression coefficients. Mosteller and
Tukey (1977, pages 549-551) recovered demographic transition data on
fertility rates and five socioeconomic indicators from 47 Swiss provinces in
1888. The socioeconomic indicators were:

1. Proportion of population involved in agriculture as an occupation.

2. Proportion of draftees receiving highest mark on army examination.

3. Proportion of population whose education is beyond primary school.

4. Proportion of the population who are Catholic.

5. Infant mortality: proportion of live births who live less than 1 year.

The common standardized fertility measure $I_g$ was used as the dependent
measure and the socioeconomic indicators $x_1, \ldots, x_5$ were the
predictors. The least squares estimated regression equation was:

\[
\hat{I}_g = .645 - .203x_1 - .295x_2 - .896x_3 + .001x_4 + 1.316x_5
\]

This regression equation reveals that fertility is inversely related to
socioeconomic status, which is consistent with the fact that at the time,
fertility was beginning to fall from the high level generally found in
underdeveloped countries to the lower level that it has today. The
correlation matrix $\{\rho_{ij}\} = (c_{ii}c_{jj})^{-1/2}c_{ij}$ of the
$(b_1, b_2, \ldots, b_5)$ was:



\[
\{\rho_{ij}\} =
\begin{pmatrix}
1.00 &      &      &      &      \\
 .21 & 1.00 &      &      &      \\
 .39 & -.59 & 1.00 &      &      \\
-.26 &  .55 & -.47 & 1.00 &      \\
 .17 & -.03 &  .15 & -.17 & 1.00
\end{pmatrix}
\]



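and the unbiased moment estimator of $\sigma^2$,

\[
s^2 = \frac{1}{N - p - 1}\left(S_{yy} - \sum_{i=1}^{p} b_i S_{iy}\right),
\tag{25}
\]

where

\[
S_{iy} = \sum_{r=1}^{N} (x_{ir} - \bar{x}_i)(y_r - \bar{y})
\quad (i = 1, \ldots, p)
\tag{26}
\]

and

\[
S_{yy} = \sum_{r=1}^{N} (y_r - \bar{y})^2,
\tag{27}
\]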



was $s^2 = .0045$. The elements $c_{ii}$ ($i = 1, 2, \ldots, p$) of $S^{-1}$
were $c_{11} = .96$, $c_{22} = 11.89$, $c_{33} = 6.55$, $c_{44} = .000024$,
and $c_{55} = 30.58$. Using the general approximation, we find that the
inequalities

\[
b_i - d c_{ii}^{1/2} s < \beta_i < b_i + d c_{ii}^{1/2} s
\quad (i = 1, 2, \ldots, p)
\tag{28}
\]

are simultaneously satisfied for $P = .95$ when $d = 2.32$, which yields the
confidence limits:
\[
\begin{array}{rcccl}
 -.36   & < & \beta_1 & < & -.051 \\
 -.83   & < & \beta_2 & < & .24   \\
-1.29   & < & \beta_3 & < & -.50  \\
 .00024 & < & \beta_4 & < & .0018 \\
 .46    & < & \beta_5 & < & 2.18
\end{array}
\]



The confidence limit for $\beta_2$ (i.e., proportion of draftees receiving
highest marks on army examination) was the only interval that included
$\beta = 0$. For a single interval $d = t_{41,.05} = 2.02$, which is
considerably smaller than the simultaneous value of $d = 2.32$ used here. Had
we used a simple Bonferroni type adjustment (i.e., $\alpha = .05/5 = .01$),
then $d = t_{41,.01} = 2.70$, which would clearly have been overly
conservative.
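
The arithmetic behind these limits can be reproduced directly from (28). The
short Python sketch below uses only the values quoted in the text (the
coefficients $b_i$, the diagonal elements $c_{ii}$, $s^2 = .0045$, and the
three multipliers 2.02, 2.32 and 2.70); the function name `limits` and the
labels are hypothetical, and the multipliers are taken as given rather than
recomputed.

```python
import math

# Values quoted in the text for the Swiss fertility regression.
b = [-0.203, -0.295, -0.896, 0.001, 1.316]       # b_1 .. b_5
c_diag = [0.96, 11.89, 6.55, 0.000024, 30.58]    # c_11 .. c_55
s2 = 0.0045                                      # estimate of sigma^2

def limits(d):
    """Intervals b_i -/+ d * sqrt(c_ii) * s from inequality (28)."""
    s = math.sqrt(s2)
    return [(bi - d * math.sqrt(ci) * s, bi + d * math.sqrt(ci) * s)
            for bi, ci in zip(b, c_diag)]

for label, d in (("single interval", 2.02),
                 ("simultaneous", 2.32),
                 ("Bonferroni", 2.70)):
    print(label, d)
    for i, (lo, hi) in enumerate(limits(d), start=1):
        print("  beta_%d: %.5f < beta < %.5f" % (i, lo, hi))
```

Comparing the three sets of intervals shows how the simultaneous multiplier
sits between the single-interval and Bonferroni widths.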



6 Summary 

In this paper we have provided methods for evaluating the multivariate
t-distribution with and without restrictions on the form of the correlation
matrix $\{\rho_{ij}\}$. Using these results, Dunnett's test for multiple
treatments compared to a single control was then generalized to various
unbalanced cases. In the more general case, in which $\{\rho_{ij}\}$ does not
have a simple unidimensional form, we have applied Clark's approximation to
the moments of the maximum of $n$ correlated random normal variables to the
problem of approximating the required MVN cdf. This approach appears to work
well, and is the only computationally tractable solution for the case of
general $\{\rho_{ij}\}$. Application to the problem of obtaining simultaneous
confidence limits for regression coefficients clearly illustrates the
importance of this approach, given that repeated use of limits designed for a
single comparison yields inadequate coverage, and simplistic adjustments that
do not take the correlational structure into consideration (e.g., Bonferroni
adjusted $\alpha' = \alpha/p$) yield limits that are overly conservative.